From Maxout to Channel-Out: Encoding Information on Sparse Pathways
Motivated by an important insight from neuroscience, we propose a new framework for understanding the success of the recently proposed "maxout" networks. The framework is based on encoding information on sparse pathways and recognizing the correct pathway at inference time. Elaborating on this insight, we propose a novel deep network architecture, called the "channel-out" network, which takes much better advantage of sparse pathway encoding. In channel-out networks, pathways are not only formed a posteriori but also actively selected according to the inference outputs of the lower layers. From a mathematical perspective, channel-out networks can represent a wider class of piecewise-continuous functions, endowing them with more expressive power than maxout networks. We test channel-out networks on several well-known image classification benchmarks, setting new state-of-the-art performance on CIFAR-100 and STL-10, two of the "harder" image classification benchmarks.
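The contrast between the two activation schemes can be made concrete with a small sketch. The group layout, the argmax selection rule, and the function names below are illustrative assumptions rather than the paper's exact formulation: maxout collapses each group of k channels to its maximum, while channel-out keeps the winning channel in place and zeroes the rest, so only the selected sparse pathway is forwarded to the next layer.

```python
import numpy as np

def maxout(x, k):
    """Maxout: collapse each group of k channels to its maximum.
    x has shape (batch, channels); output has shape (batch, channels // k)."""
    n, c = x.shape
    return x.reshape(n, c // k, k).max(axis=2)

def channel_out(x, k):
    """Channel-out (sketch): within each group of k channels, keep only the
    winning channel (here, the argmax) and zero the others, so the selected
    pathway carries the information forward. Output keeps shape (batch, channels)."""
    n, c = x.shape
    g = x.reshape(n, c // k, k)
    mask = g == g.max(axis=2, keepdims=True)  # ties keep all winners in this toy version
    return (g * mask).reshape(n, c)
```

Note that maxout discards which channel won, whereas channel-out preserves the winner's identity in the output layout, which is one way to read the abstract's claim that pathways are actively selected rather than merely formed a posteriori.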
On the Difficulty of Manhattan Channel Routing
We show that channel routing in the Manhattan model remains difficult even when all nets are single-sided. Given a set of n single-sided nets, we consider the problem of determining the minimum number of tracks required to obtain a dogleg-free routing. In addition to showing that the decision version of the problem is NP-complete, we show that there are problems requiring at least d + Omega(sqrt(n)) tracks, where d is the density. This existential lower bound does not follow from any of the known lower bounds in the literature.
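For context, the density d in the bound above is the maximum number of nets whose horizontal spans cross any single column, and it is the classical lower bound on the number of tracks. A minimal sweep-line sketch of computing it, with nets given as illustrative (left, right) terminal-column pairs of our own choosing:

```python
def channel_density(nets):
    """Density d of a channel routing instance: the maximum number of nets
    whose spans [left, right] cross any single column."""
    events = []
    for left, right in nets:
        events.append((left, 1))        # net starts crossing at its left end
        events.append((right + 1, -1))  # net stops crossing after its right end
    density = current = 0
    for _, delta in sorted(events):     # removals at a column sort before additions
        current += delta
        density = max(density, current)
    return density

print(channel_density([(0, 3), (2, 5), (4, 7)]))  # -> 2
```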
Constructing Inverted Files: To MapReduce or Not Revisited
Current high-throughput algorithms for constructing inverted files all follow the MapReduce framework, which presents a high-level programming model that hides the complexities of parallel programming. In this paper, we take an alternative approach and develop a novel strategy that exploits current and emerging multicore processor architectures. Our algorithm is based on a high-throughput pipelined strategy that produces parallel parsed streams, which are immediately consumed at the same rate by parallel indexers. We have performed extensive tests of our algorithm on a cluster of 32 nodes and achieved throughput close to the peak throughput of the I/O system: 280 MB/s on a single node, and between 5.15 GB/s (1 Gb/s Ethernet interconnect) and 6.12 GB/s (10 Gb/s InfiniBand interconnect) on the 32-node cluster when processing the ClueWeb09 dataset. This performance represents a substantial gain over the best known MapReduce algorithms, even when comparing the single-node performance of our algorithm to MapReduce algorithms running on large clusters. Our results shed light on the extent of the performance cost that may be incurred by using the simpler, higher-level MapReduce programming model for large-scale applications.
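A toy single-node analogue of the pipelined strategy may help fix ideas. This sketch is our own illustration, not the paper's implementation: it connects one parser thread to one indexer thread through a queue (the paper runs many of each in parallel), with parsed postings consumed as soon as they are produced.

```python
import threading
from queue import Queue
from collections import defaultdict

def parser(doc_queue, posting_queue):
    """Producer: parse raw documents into (term, doc_id) postings as they arrive."""
    while (item := doc_queue.get()) is not None:
        doc_id, text = item
        for term in text.lower().split():
            posting_queue.put((term, doc_id))
    posting_queue.put(None)  # signal the indexer that parsing is finished

def indexer(posting_queue, index):
    """Consumer: build the inverted file at the rate postings are produced."""
    while (item := posting_queue.get()) is not None:
        term, doc_id = item
        index[term].append(doc_id)

docs, postings = Queue(), Queue()
index = defaultdict(list)
threads = [threading.Thread(target=parser, args=(docs, postings)),
           threading.Thread(target=indexer, args=(postings, index))]
for t in threads:
    t.start()
for doc in [(1, "to MapReduce or not"), (2, "pipelined parsed streams")]:
    docs.put(doc)
docs.put(None)
for t in threads:
    t.join()
print(dict(index))
```

The essential property the abstract emphasizes is that parsing and indexing overlap in time, so neither stage waits for a full batch from the other.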
Optimization of Linked List Prefix Computations on Multithreaded GPUs Using CUDA
We present a number of optimization techniques to compute prefix sums on linked lists and implement them on multithreaded GPUs using CUDA. Prefix computations on linked structures generally involve highly irregular fine-grain memory accesses that are typical of many computations on linked lists, trees, and graphs. While the current generation of GPUs provides substantial computational power and extremely high-bandwidth memory access, they may appear at first to be primarily geared toward streamed, highly data-parallel computations. In this paper, we introduce an optimized multithreaded GPU algorithm for prefix computations based on a randomization process that reduces the problem to a large number of fine-grain computations. We map these fine-grain computations onto multithreaded GPUs in such a way that the processing cost per element is shown to be close to the best possible. Our experimental results show scalability for list sizes ranging from 1M to 256M nodes and significantly improve on recently published parallel implementations of list ranking, including implementations on the Cell processor, the MTA-8, and the NVIDIA GeForce 200 series. They also compare favorably to the performance of the best known CUDA algorithm for the scan operation on the Tesla C1060.
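The randomization the abstract refers to is in the spirit of splitter-based list ranking: random splitters cut the list into independent sublists whose local ranks can be computed concurrently, followed by a cheap sequential fix-up over the sublists. The sketch below is a sequential Python illustration of that reduction under our own assumptions (array-of-successors representation, ranks counted from the head), not the paper's CUDA kernel.

```python
import random

def list_rank(succ, head, s=4):
    """Splitter-based list ranking sketch. succ[i] is the successor of node i,
    -1 at the tail; every node belongs to the single list starting at head."""
    n = len(succ)
    splitters = {head} | set(random.sample(range(n), min(s, n)))
    local = [0] * n          # rank within the owning sublist
    owner = [-1] * n         # splitter that starts each node's sublist
    length, next_splitter = {}, {}
    for sp in splitters:     # each walk is an independent task (parallel on a GPU)
        i, r = sp, 0
        while True:
            local[i], owner[i] = r, sp
            j = succ[i]
            if j == -1 or j in splitters:
                length[sp], next_splitter[sp] = r + 1, j
                break
            i, r = j, r + 1
    # sequential fix-up: accumulate sublist offsets in list order
    offset, sp, acc = {}, head, 0
    while sp != -1:
        offset[sp] = acc
        acc += length[sp]
        sp = next_splitter[sp]
    return [offset[owner[i]] + local[i] for i in range(n)]

print(list_rank([1, 2, -1], head=0))  # -> [0, 1, 2]
```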
Archiving Temporal Web Information: Organization of Web Contents for Fast Access and Compact Storage
We address the problem of archiving dynamic web contents over significant time spans. Current schemes crawl the web contents at regular time intervals and archive the contents after each crawl, regardless of whether or not the contents have changed between consecutive crawls. Our goal is to store newly crawled web contents only when they differ from the previous crawl, while ensuring accurate and fast retrieval of archived contents in response to arbitrary temporal queries over the archived time period. In this paper, we develop a scheme that stores unique temporal web contents in containers following the widely used ARC/WARC format and that provides fast access to the archived contents for arbitrary temporal queries. A novel component of our scheme is a new indexing structure based on the concept of persistent (multi-version) data structures. Our scheme can be shown to be asymptotically optimal in both storage utilization and insert/retrieval time. We illustrate the performance of our method on two very different data sets from the Stanford WebBase project, the first reflecting very dynamic web contents and the second relatively static web contents. The experimental results clearly illustrate the substantial storage savings achieved by eliminating duplicate contents detected between consecutive crawls, as well as the speed at which our method can find archived contents specified through arbitrary temporal queries.
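A toy analogue of the multi-version index may clarify the retrieval path. The class and field names below are illustrative assumptions, not the paper's structure: a new version entry is appended for a URL only when its content hash differs from the previous crawl, and a temporal query binary-searches the per-URL version list for the entry in effect at the requested time.

```python
import bisect
from collections import defaultdict

class TemporalIndex:
    """Toy multi-version index: one entry per URL per *changed* crawl."""
    def __init__(self):
        self.versions = defaultdict(list)  # url -> [(crawl_time, container_location)]
        self.last_hash = {}

    def record(self, url, crawl_time, content_hash, location):
        if self.last_hash.get(url) == content_hash:
            return  # unchanged since the previous crawl: store nothing
        self.last_hash[url] = content_hash
        self.versions[url].append((crawl_time, location))

    def lookup(self, url, query_time):
        """Return the location of the version in effect at query_time, if any."""
        entries = self.versions.get(url, [])
        times = [t for t, _ in entries]
        i = bisect.bisect_right(times, query_time)
        return entries[i - 1][1] if i else None
```

Storage grows only with the number of distinct versions, while each temporal lookup costs logarithmic time in the number of versions of that URL, which mirrors the asymptotic-optimality claim in the abstract.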
Web Archiving: Organizing Web Objects into Web Containers to Optimize Access
The web is becoming the preferred medium for communicating and storing information pertaining to almost any human activity. However, it is an ephemeral medium whose contents are constantly changing, resulting in the permanent loss of part of our cultural and scientific heritage on a regular basis. Archiving important web contents is a very challenging technical problem due to the web's tremendous scale, complex structure, extremely dynamic nature, and rich, heterogeneous, deep contents. In this paper, we consider the problem of archiving a linked set of web objects into web containers in such a way as to minimize the number of containers accessed during a typical browsing session. We develop a method that uses the notion of PageRank and optimized graph partitioning to enable faster browsing of archived web contents. We include simulation results that illustrate the performance of our scheme and compare it to the common scheme currently used to organize web objects into web containers.
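As a rough illustration of the idea (not the paper's optimized partitioner), one can compute PageRank by power iteration and then greedily grow fixed-capacity containers along links from high-PageRank seeds, so that a typical browsing session tends to stay inside one container. The greedy rule and all names below are our own assumptions.

```python
def pagerank(adj, iters=50, d=0.85):
    """Plain power-iteration PageRank over an adjacency dict
    (dangling-node mass is ignored for brevity)."""
    n = len(adj)
    pr = {u: 1.0 / n for u in adj}
    for _ in range(iters):
        nxt = {u: (1 - d) / n for u in adj}
        for u, outs in adj.items():
            share = d * pr[u] / max(len(outs), 1)
            for v in outs:
                nxt[v] += share
        pr = nxt
    return pr

def pack_containers(adj, capacity):
    """Greedy stand-in for the optimized partitioning: seed each container with
    the highest-PageRank unassigned object, then grow it breadth-first along
    links so linked objects land in the same container."""
    pr = pagerank(adj)
    assigned, containers = set(), []
    for seed in sorted(adj, key=pr.get, reverse=True):
        if seed in assigned:
            continue
        box, frontier = [], [seed]
        while frontier and len(box) < capacity:
            u = frontier.pop(0)
            if u in assigned:
                continue
            assigned.add(u)
            box.append(u)
            frontier.extend(v for v in adj[u] if v not in assigned)
        containers.append(box)
    return containers
```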